KSTEST

Overview

The KSTEST function performs the one-sample Kolmogorov-Smirnov (K-S) test, a nonparametric goodness-of-fit test that determines whether a sample comes from a specified theoretical probability distribution. Named after mathematicians Andrey Kolmogorov and Nikolai Smirnov, who developed it in the 1930s, the test is widely used in statistics to validate distributional assumptions.

The K-S test works by comparing the empirical distribution function (EDF) of the sample data against the cumulative distribution function (CDF) of the reference distribution. The test statistic D_n is defined as the supremum (maximum) of the absolute differences between these two functions:

D_n = \sup_x |F_n(x) - F(x)|

where F_n(x) is the empirical distribution function of the sample and F(x) is the CDF of the theoretical distribution being tested. Intuitively, the statistic captures the largest vertical distance between the sample’s step function and the hypothesized smooth distribution curve.
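
To make the definition concrete, the statistic can be computed directly from the sorted sample: because the empirical distribution function is a step function, the supremum is attained just before or just after one of its jumps. The following sketch (using SciPy; it is not part of the documented function) computes D_n from the definition for the sample used in Example 1 below and checks it against scipy.stats.kstest.

from scipy import stats

sample = [0.1, -0.5, 0.3, -0.2, 0.8, 0.0]
x = sorted(sample)
n = len(x)

# Theoretical CDF evaluated at the sorted sample points (standard normal here)
cdf_vals = stats.norm.cdf(x)

# The EDF jumps from i/n to (i+1)/n at the i-th sorted point (0-based), so the
# supremum of |F_n(x) - F(x)| is attained just before or just after a jump.
d_plus = max((i + 1) / n - c for i, c in enumerate(cdf_vals))
d_minus = max(c - i / n for i, c in enumerate(cdf_vals))
d_n = max(d_plus, d_minus)

print(d_n)                                      # statistic from the definition
print(stats.kstest(sample, "norm").statistic)   # same value from SciPy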

This implementation uses the scipy.stats.kstest function from the SciPy library. The function supports testing against various distributions available in scipy.stats, including the normal, uniform, exponential, and gamma distributions. Three alternative hypotheses are available: two-sided (the two distributions are not identical), less (the empirical CDF lies below the theoretical CDF for at least one point), and greater (the empirical CDF lies above the theoretical CDF for at least one point).
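
For reference, a minimal sketch of the underlying SciPy call (the sample values below are illustrative only), showing a named distribution, explicit distribution parameters, and the three alternative hypotheses:

from scipy import stats

data = [0.1, -0.5, 0.3, -0.2, 0.8, 0.0]  # illustrative sample

# Test against a standard normal and against an exponential with loc=0, scale=2
print(stats.kstest(data, "norm"))
print(stats.kstest(data, "expon", args=(0, 2)))

# The same test under each alternative hypothesis
for alt in ("two-sided", "less", "greater"):
    res = stats.kstest(data, "norm", alternative=alt)
    print(alt, res.statistic, res.pvalue)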

The function returns both the K-S statistic and a p-value. A small p-value (typically < 0.05) indicates that the sample is unlikely to have come from the specified distribution. For p-value computation, the function offers several methods: exact uses the exact distribution of the test statistic, approx approximates the two-sided p-value as twice the one-sided p-value, asymp uses the asymptotic Kolmogorov distribution, and auto selects the most appropriate method based on the sample size.
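
The choice of method is passed through to the method argument of scipy.stats.kstest. A short sketch with an illustrative sample (for small samples the exact and asymptotic p-values can differ noticeably):

from scipy import stats

data = [0.1, -0.5, 0.3, -0.2, 0.8, 0.0]  # illustrative sample

for method in ("auto", "exact", "approx", "asymp"):
    res = stats.kstest(data, "norm", method=method)
    print(method, round(res.pvalue, 4))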

The K-S test is most effective for continuous distributions and requires no assumptions about the underlying data beyond continuity. However, it tends to be less powerful than specialized tests like the Shapiro-Wilk test or Anderson-Darling test when testing for specific distributions such as normality. For more information, see the Kolmogorov-Smirnov test Wikipedia article.
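
As a point of comparison, SciPy also provides the Shapiro-Wilk and Anderson-Darling tests mentioned above. A sketch running all three normality checks on the same illustrative sample (note that scipy.stats.anderson reports critical values rather than a p-value):

from scipy import stats

data = [0.1, -0.5, 0.3, -0.2, 0.8, 0.0, 1.2, -0.7, 0.4, -0.1]  # illustrative

ks = stats.kstest(data, "norm")         # Kolmogorov-Smirnov
sw = stats.shapiro(data)                # Shapiro-Wilk
ad = stats.anderson(data, dist="norm")  # Anderson-Darling

print("K-S:", ks.statistic, ks.pvalue)
print("Shapiro-Wilk:", sw.statistic, sw.pvalue)
print("Anderson-Darling:", ad.statistic, ad.critical_values)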

This example function is provided as-is without any representation of accuracy.

Excel Usage

=KSTEST(rvs, kstest_cdf, kstest_args, kstest_alternative, kstest_method)
  • rvs (list[list], required): Sample data to test against the theoretical distribution
  • kstest_cdf (str, optional, default: “norm”): Name of theoretical distribution to test against
  • kstest_args (list[list], optional, default: null): Parameters for the theoretical distribution (e.g., mean and std for normal)
  • kstest_alternative (str, optional, default: “two-sided”): Defines the null and alternative hypotheses
  • kstest_method (str, optional, default: “auto”): Method for calculating the p-value

Returns (list[list]): 2D list [[statistic, p_value]], or error message string.

Examples

Example 1: Test sample against standard normal distribution

Inputs:

rvs kstest_cdf
0.1 -0.5 norm
0.3 -0.2
0.8 0

Excel formula:

=KSTEST({0.1,-0.5;0.3,-0.2;0.8,0}, "norm")

Expected output:

Result
statistic p_value
0.3085 0.5191
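
Passing the same values straight to SciPy (outside Excel) should reproduce the statistic and p-value shown above:

from scipy import stats

# Flatten the 3x2 Excel range {0.1,-0.5;0.3,-0.2;0.8,0} row by row
sample = [0.1, -0.5, 0.3, -0.2, 0.8, 0.0]
res = stats.kstest(sample, "norm")
print(round(res.statistic, 4), round(res.pvalue, 4))  # expected: 0.3085 0.5191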

Example 2: Test sample against uniform distribution

Inputs:

rvs kstest_cdf
0.1 0.2 uniform
0.3 0.4
0.5 0.6

Excel formula:

=KSTEST({0.1,0.2;0.3,0.4;0.5,0.6}, "uniform")

Expected output:

Result
statistic p_value
0.4 0.224

Example 3: Test normal distribution with custom mean and std

Inputs:

rvs kstest_cdf kstest_args
5 5.2 norm 5 0.5
4.8 5.1

Excel formula:

=KSTEST({5,5.2;4.8,5.1}, "norm", {5,0.5})

Expected output:

Result
statistic p_value
0.3446 0.6237
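
The equivalent call made directly with SciPy; the two entries of kstest_args map to the normal distribution's loc (mean) and scale (standard deviation):

from scipy import stats

sample = [5, 5.2, 4.8, 5.1]
res = stats.kstest(sample, "norm", args=(5, 0.5))  # loc=5, scale=0.5
print(round(res.statistic, 4), round(res.pvalue, 4))  # expected: 0.3446 0.6237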

Example 4: One-sided test with greater alternative

Inputs:

rvs kstest_cdf kstest_alternative
0.1 0.2 uniform greater
0.3 0.4

Excel formula:

=KSTEST({0.1,0.2;0.3,0.4}, "uniform", "greater")

Expected output:

Result
statistic p_value
0.6 0.0337
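
The equivalent call made directly with SciPy; with alternative="greater" the statistic is D+, the largest amount by which the empirical CDF exceeds the theoretical CDF:

from scipy import stats

sample = [0.1, 0.2, 0.3, 0.4]
res = stats.kstest(sample, "uniform", alternative="greater")
print(round(res.statistic, 4), round(res.pvalue, 4))  # expected: 0.6 0.0337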

Example 5: Test exponential distribution with all parameters

Inputs:

rvs kstest_cdf kstest_args kstest_alternative kstest_method
1 2 expon 0 2 two-sided asymp
3 4

Excel formula:

=KSTEST({1,2;3,4}, "expon", {0,2}, "two-sided", "asymp")

Expected output:

Result
statistic p_value
0.3935 0.5655
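
The equivalent call made directly with SciPy, with every argument supplied; for the exponential distribution, args=(0, 2) maps to loc=0 and scale=2:

from scipy import stats

sample = [1, 2, 3, 4]
res = stats.kstest(sample, "expon", args=(0, 2),
                   alternative="two-sided", method="asymp")
print(round(res.statistic, 4), round(res.pvalue, 4))  # expected: 0.3935 0.5655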

Python Code

from scipy import stats
import math

def kstest(rvs, kstest_cdf='norm', kstest_args=None, kstest_alternative='two-sided', kstest_method='auto'):
    """
    Performs the one-sample Kolmogorov-Smirnov test for goodness of fit.

    See: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.kstest.html

    This example function is provided as-is without any representation of accuracy.

    Args:
        rvs (list[list]): Sample data to test against the theoretical distribution
        kstest_cdf (str, optional): Name of the theoretical distribution to test against, given as a scipy.stats distribution name. Valid options include 'norm' (Normal), 'uniform' (Uniform), 'expon' (Exponential), 'lognorm' (Log-Normal), 'beta' (Beta), 'gamma' (Gamma), 'chi2' (Chi-Square), 't' (t), 'f' (F), and 'weibull_min' (Weibull). Default is 'norm'.
        kstest_args (list[list], optional): Parameters for the theoretical distribution (e.g., mean and std for normal). Default is None.
        kstest_alternative (str, optional): Defines the null and alternative hypotheses. Valid options: 'two-sided', 'less', 'greater'. Default is 'two-sided'.
        kstest_method (str, optional): Method for calculating the p-value. Valid options: 'auto', 'exact', 'approx', 'asymp'. Default is 'auto'.

    Returns:
        list[list]: 2D list [[statistic, p_value]], or error message string.
    """
    def to2d(x):
        return [[x]] if not isinstance(x, list) else x

    # Normalize rvs to 2D list
    rvs = to2d(rvs)

    # Validate rvs is a 2D list with at least one row
    if not all(isinstance(row, list) for row in rvs) or len(rvs) < 1:
        return "Invalid input: rvs must be a 2D list with at least one row."

    # Flatten rvs to 1D sample data
    try:
        sample_data = [float(item) for row in rvs for item in row]
    except (TypeError, ValueError):
        return "Invalid input: rvs must contain only numeric values."

    if len(sample_data) < 2:
        return "Invalid input: sample must contain at least two values."

    # Validate kstest_cdf is a string
    if not isinstance(kstest_cdf, str):
        return "Invalid input: kstest_cdf must be a string naming a distribution."

    # Parse distribution parameters if provided
    dist_args = ()
    if kstest_args is not None:
        kstest_args = to2d(kstest_args)
        if not all(isinstance(row, list) for row in kstest_args):
            return "Invalid input: kstest_args must be a 2D list or None."
        try:
            dist_args = tuple(float(item) for row in kstest_args for item in row)
        except (TypeError, ValueError):
            return "Invalid input: kstest_args must contain only numeric values."

    # Validate kstest_alternative
    valid_alternatives = ('two-sided', 'less', 'greater')
    if kstest_alternative not in valid_alternatives:
        return f"Invalid input: kstest_alternative must be one of {valid_alternatives}."

    # Validate kstest_method
    valid_methods = ('auto', 'exact', 'approx', 'asymp')
    if kstest_method not in valid_methods:
        return f"Invalid input: kstest_method must be one of {valid_methods}."

    # Look up the distribution in scipy.stats and confirm it exposes a cdf
    distribution = getattr(stats, kstest_cdf, None)
    if distribution is None or not hasattr(distribution, 'cdf'):
        return f"Invalid input: '{kstest_cdf}' is not a recognized distribution in scipy.stats."

    # Call scipy.stats.kstest with the distribution's CDF and any parameters
    try:
        result = stats.kstest(sample_data, distribution.cdf, args=dist_args,
                              alternative=kstest_alternative, method=kstest_method)
        stat = float(result.statistic)
        pvalue = float(result.pvalue)
    except Exception as e:
        return f"Error in scipy.stats.kstest: {e}"

    # Check for nan/inf
    if math.isnan(stat) or math.isnan(pvalue) or math.isinf(stat) or math.isinf(pvalue):
        return "Invalid result: output contains nan or inf."

    return [[stat, pvalue]]
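
A short usage sketch of the wrapper itself, reusing the inputs from Examples 1 and 3 (the commented values are the expected outputs from those examples):

# Example 1: 3x2 range tested against the standard normal distribution
print(kstest([[0.1, -0.5], [0.3, -0.2], [0.8, 0]]))
# -> [[0.3085..., 0.5191...]]

# Example 3: custom mean and standard deviation via kstest_args
print(kstest([[5, 5.2], [4.8, 5.1]], "norm", [[5, 0.5]]))
# -> [[0.3446..., 0.6237...]]

# An unrecognized distribution name returns an error string instead of raising
print(kstest([[0.1, 0.2], [0.3, 0.4]], "not_a_distribution"))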

Online Calculator